Efficient Mining of Emerging Patterns: Discovering Trends and Differences

Authors

  • Guozhu Dong
  • Jinyan Li
Abstract

We introduce a new kind of pattern, called emerging patterns (EPs), for knowledge discovery from databases. EPs are defined as itemsets whose supports increase significantly from one dataset to another. EPs can capture emerging trends in timestamped databases, or useful contrasts between data classes. EPs have proven useful: we have used them to build very powerful classifiers, which are more accurate than C4.5 and CBA on many datasets. We believe that EPs with low to medium support, such as 1% to 20%, can give useful new insights and guidance to experts, even in "well understood" applications. The efficient mining of EPs is a challenging problem, since (i) the Apriori property no longer holds for EPs, and (ii) there are usually too many candidates for high-dimensional databases or for small support thresholds such as 0.5%. Naive algorithms are too costly. To solve this problem, (a) we promote the description of large collections of itemsets using their concise borders (the pair of sets of the minimal and of the maximal itemsets in the collections). (b) We design EP mining algorithms which manipulate only the borders of collections (using our multi-border-differential algorithm), and which represent discovered EPs using borders. All EPs satisfying a constraint can be efficiently discovered by our border-based algorithms, which take as inputs the borders of large itemsets derived by Max-Miner. In our experiments on large and high-dimensional datasets, including the US census and Mushroom datasets, many EPs, including some with large cardinality, are found quickly. We also give other algorithms for discovering general or special types of EPs.
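As a rough illustration of the definitions above, the sketch below computes supports and growth rates, enumerates EPs by brute force (the kind of naive algorithm the abstract calls too costly), and tests whether an itemset lies in a collection described by a border. It is a minimal sketch under stated assumptions, not the authors' border-based algorithm. It assumes that the growth rate of an itemset X from D1 to D2 is supp_D2(X) / supp_D1(X), taken as infinite when X is absent from D1 but present in D2. It also assumes that a border <L, R> stands for every itemset Y with X ⊆ Y ⊆ Z for some X in L and some Z in R. The function names and the parameters `rho` and `min_support` are hypothetical.

```python
from itertools import combinations
from math import inf


def support(itemset, dataset):
    """Fraction of transactions (sets of items) in `dataset` that contain all of `itemset`."""
    if not dataset:
        return 0.0
    return sum(1 for t in dataset if itemset <= t) / len(dataset)


def growth_rate(itemset, d1, d2):
    """Assumed definition: supp_d2 / supp_d1, infinite if absent from d1 but present in d2."""
    s1, s2 = support(itemset, d1), support(itemset, d2)
    if s1 == 0:
        return inf if s2 > 0 else 0.0
    return s2 / s1


def naive_eps(d1, d2, rho, min_support):
    """Brute force over every itemset built from items seen in d2.

    Cost grows as 2**(number of items), which is why this does not scale.
    Assumes rho > 1 and min_support > 0, so itemsets absent from d2 never qualify.
    """
    items = sorted(set().union(*d2)) if d2 else []
    eps = []
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            x = frozenset(combo)
            if support(x, d2) >= min_support and growth_rate(x, d1, d2) >= rho:
                eps.append(x)
    return eps


def in_interval(itemset, left, right):
    """True if X <= itemset <= Z for some X in `left` and some Z in `right`."""
    return any(x <= itemset for x in left) and any(itemset <= z for z in right)
```

On toy data d1 = [{a,b}, {a}, {b,c}, {c}] and d2 = [{a,b}, {a,b,c}, {a,b}, {c}] with rho = 2 and min_support = 0.25, the brute-force search returns {a,b}, {a,c} and {a,b,c}. In this small case that collection happens to be exactly the interval with left bound {{a,b}, {a,c}} and right bound {{a,b,c}}, the kind of concise description that border-based mining manipulates instead of listing every itemset.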

Similar resources

A Proposed Data Mining Methodology and its Application to Industrial Procedures

Data mining is the process of discovering correlations, patterns, trends or relationships by searching through a large amount of data stored in repositories, corporate databases, and data warehouses. Industrial procedures with the help of engineers, managers, and other specialists, comprise a broad field and have many tools and techniques in their problem-solving arsenal. The purpose of this st...

Mining of Emerging trends of Covid-19 thematic areas in National and International publications

Background & Aim: The results from the analysis of COVID-19 literature by employing text-mining techniques are of particular importance for researchers, policymakers, and planners of medical sciences at the national and international levels, avoiding parallel research and waste of time and budget. The paper explores emerging topics and the trend of scientific words at the national and internation...

Discovering Unknown Patterns in Free Text

Copyright © 2006, Idea Group Inc., distributing in print or electronic forms without written permission of IGI is prohibited. INTRODUCTION A very large percentage of business and academic data is stored in textual format. With the exception of metadata, such as author, date, title and publisher, these data are not overtly structured like the standard, mainly numerical, data in relational databa...

Understanding Temporal Human Mobility Patterns in a City by Mobile Cellular Data Mining, Case Study: Tehran City

Recent studies have shown that urban complex behaviors like human mobility should be examined by newer and smarter methods. The ubiquitous use of mobile phones and other smart communication devices helps us use a bigger amount of data that can be browsed by the hours of the day, the days of the week, geographic area, meteorological conditions, and so on. In this article, mobile cellular data mi...

Trends and patterns of evolution for product innovation

Perhaps the most promising TRIZ tools are trends and patterns of evolution. The idea that technological systems tend to go forward in a way analogous to that of biological systems has been supporting the research of the evolution of several products. Some degree of coincidence to this analogy has been found in several cases using statistical analysis tools in patent databases. This paper starts ...

Publication date: 1999